Tornado code

In computer science, Tornado codes are a class of erasure codes that support error correction. Tornado codes require a constant C more redundant blocks than the more data-efficient Reed-Solomon erasure codes, but are much faster to generate and can fix erasures faster. Software-based implementations of tornado codes are about 100 times faster on small lengths and about 10,000 times faster on larger lengths than Reed-Solomon erasure codes.[1] Since the introduction of Tornado codes, many other similar erasure codes have emerged, most notably Online codes, LT codes and Raptor codes.

Tornado codes use a layered approach. All layers except the last use an LDPC error correction code, which is fast but has a chance of failure. The final layer uses a Reed-Solomon correction code, which is slower but is optimal in terms of failure recovery. Tornado codes dictates how many levels, how many recovery blocks in each level, and the distribution used to generate blocks for the non-final layers.

Contents

Overview

The input data is divided into blocks. Blocks are sequences of bits that are all the same size. Recovery data uses the same block size as the input data. The erasure of a block (input or recovery) is detected by some other means. (For example, a block from disk does not pass a CRC check or a network packet with a given sequence number never arrived.)

The number of recovery blocks is given by the user. Then the number of levels is determined along with the number of blocks in each level. The number in each level is determined by a factor B which is less than one. If there are N input blocks, the first recovery level has B*N blocks, the second has B*B*N, the third has B*B*B*N, and so on.

All levels of recovery except the final one use an LDPC, which works by xor (exclusive-or). Xor operates on binary values, 1s and 0s. A xor B is 1 if A and B have different values and 0 if A and B have the same values. If you are given (A xor B) and A, you can determine the value for B. (A xor B xor A = B) Similarly, if you are given (A xor B) and B, you can determine the value for A. This extends to multiple values, so given (A xor B xor C xor D) and any 3 of the values, the missing value can be recovered.

So the recovery blocks in level one are just the xor of some set of input blocks. Similarly, the recovery blocks in level two are each the xor of some set of blocks in level one. The blocks used in the xor are chosen randomly, without repetition. However, the number of blocks xor'ed to make a recovery block is chosen from a very specific distribution for each level.

Since xor is a fast operation and the recovery blocks are an xor of only a subset of the blocks in the input (or at a lower recovery level), the recovery blocks can be generated quickly.

The final level is a Reed-Solomon code. Reed-Solomon codes are optimal in terms of recovering from failures, but slow to generate and recover. Since each level has fewer blocks than the one before, the Reed-Solomon code has a small number of recovery blocks to generate and to use in recovery. So, even though Reed-Solomon is slow, it only has a small amount of data to handle.

During recovery, the Reed-Solomon code is recovered first. This is guaranteed to work if the number of missing blocks in the next-to-final level is less than the present blocks in the final level.

Going lower, the LDPC (xor) recovery level can be used to recover the level beneath it with high probability if all the recovery blocks are present and the level beneath is missing at most C' fewer blocks than the recovery level. The algorithm for recovery is to find some recovery block that has only one of its generating set missing from the lower level. Then the xor of the recovery block with all of the blocks that are present is equal to the missing block.

Patent issues

Tornado codes are patented inside the United States of America.[2]

Citations

Michael Luby created the Tornado codes.[3][4]

External links

A readable description from CMU (PostScript) [1] and another from Luby at the International Computer Science Institute (PostScript) [2].

Notes

  1. ^ A digital fountain approach to reliable distribution of bulk data. http://portal.acm.org/citation.cfm?id=285243.285258
  2. ^ (Mitzenmacher 2004)
  3. ^ (Luby 1997)
  4. ^ (Luby 1998)

References